Abstract

Background: Phylogeographic composition of M. tuberculosis populations reveals associations between lineages and
human populations that might have implications for the development of strategies to control the disease. In Latin America,
lineage 4, or the Euro-American lineage, is predominant, with considerable variation among and within countries. In Colombia,
although a few studies from specific localities have revealed differences in M. tuberculosis populations, there are still areas of
the country where this information is lacking, as is a comparison of Colombian isolates with those from the rest of the world.

Principal Findings: A total of 414 M. tuberculosis isolates from adult pulmonary tuberculosis cases from three Colombian 
states were studied. Isolates were genotyped using IS6110-restriction fragment length polymorphism (RFLP), spoligotyping,
and 24-locus Mycobacterial interspersed repetitive units variable number tandem repeats (MIRU-VNTRs). SIT42 (LAM9) and
SIT62 (H1) represented 53.3% of isolates, followed by 8.21% SIT50 (H3), 5.07% SIT53 (T1), and 3.14% SIT727 (H1). Composite
spoligotyping and 24-locus MIRU-VNTR minimum spanning tree analysis suggests a recent expansion of SIT42 and SIT62,
which evolved originally from SIT53 (T1). The proportion of the Haarlem sublineage (44.3%) was significantly higher than that in
neighboring countries. Associations were found between M. tuberculosis MDR and SIT45 (H1), as well as HIV-positive
serology with SIT727 (H1) and SIT53 (T1).

Conclusions: This study showed the population structure of M. tuberculosis in several regions from Colombia with a 
dominance of the LAM and Haarlem sublineages, particularly in two major urban settings (Medellin and Cali). Dominant
spoligotypes were LAM9 (SIT 42) and Haarlem (SIT62). The proportion of the Haarlem sublineage was higher in Colombia 
compared to that in neighboring countries, suggesting particular conditions of co-evolution with the corresponding human 
population that favor the success of this sublineage. 
Introduction

Tuberculosis (TB) continues to be a challenge to control. 
Although widespread and common efforts have had an impact in 
achieving declining numbers in global incidence for the first time 
in history, TB still causes 8.7 million new cases and 1.4 million 
deaths per year [1]. 

The worldwide population structure of Mycobacterium tuberculosis 
has been defined, linking specific lineages to human populations. 
Using comparative genomics and large sequence polymorphisms
(LSPs), six phylogeographic lineages have been described and
associated with human populations around the world [2]. For
example, the East-Asian lineage is dominant in many countries of
the Far East, while the Indo-Oceanic lineage occurs all around the
Indian Ocean. The Euro-American lineage is clearly the most
frequent lineage in Europe and the Americas. The relationships
between these lineages and human populations are supported not
only by studies with isolates from around the world, but also by the
tendency of each lineage to cause the disease in populations in
specific urban cosmopolitan settings [3-6].

Genotyping techniques based on repetitive elements such as 
restriction fragment length polymorphism (RFLP) using IS6110 
[7], spoligotyping of clustered repetitive interspersed short 
palindromic repeats (CRISPR) [8], and mycobacterial inter- 
spersed repetitive units variable number tandem repeats (MIRU- 
VNTRs) [9], have been used to support epidemiological studies as 
well as to define the population structure in M. tuberculosis [10-12]. 
Spoligotyping and MIRU-VNTR have replaced the standard
IS6110-RFLP due to their ease of implementation and standardization,
and the availability of international databases for making
comparisons [10,13,14]. These latter techniques have demonstrated
concordance with the assignment of major lineages as defined
by LSPs [15-17], despite being subject to convergent evolution
[18].

Colombia is the third most populated country in Latin America. 
The country has nearly 47 million inhabitants and has changed
from being mainly a rural population at the beginning of the 20th
century to a mostly urban population in the 21st century. The
increase in urban population together with crowding and poor 
living conditions in the outskirts of major cities maintain favorable 
conditions for TB transmission. Despite the country's efforts to 
control the disease, the estimated incidence is 34 per 100,000 
individuals in the population, corresponding to approximately 
16,000 new cases per year [19], which poses a major task for the 
public health control of the disease. The overall incidence of TB
for Colombia hides the dissimilarities between regions, reflecting
differences in control measures as well as differences in transmission
dynamics. These situations, in turn, should influence the
relationship established between human and M. tuberculosis
populations.

Several studies have demonstrated the distribution of M.
tuberculosis lineages and sublineages in Latin American countries,
confirming the overwhelming predominance of the Euro-American
lineage but with considerable variation in the distribution of
sublineages or clades between and within countries [20-25]. In
Colombia, studies performed in a few specific locations also show a
predominance of Euro-American lineages with differences among
localities [26,27].

The aim of this study was to further assess the distribution of M. 
tuberculosis lineages and sublineages in Colombia and to gain a 
better understanding of the dynamics of the disease. M. tuberculosis 
isolates were obtained from patients with pulmonary tuberculosis 
from three different regions of Colombia. All isolates were
genotyped by using IS6110-RFLP, spoligotyping, and 24-locus
MIRU-VNTR. Then, associations between the main M. tuberculosis
sublineages and the demographic and epidemiologic characteristics
of patients were evaluated. The discriminatory power of
the different genotyping methods was also calculated.

Methods 

Ethics statement 

All study procedures were approved by the Ethics Review
Boards of the participating institutions that were in charge of
recruiting the patients: Universidad de Antioquia, Centro
Internacional de Entrenamiento e Investigaciones Medicas (CIDEIM),
and Universidad del Cauca. All patients signed a written consent
form previously approved by the ethics committee. When
patients were less than 18 years old, signed written informed
consent was obtained with the additional approval and signature
of one of the parents. All signed consent forms were kept in physical
files locked under the custody of the principal investigators to
maintain the anonymity of patients. The study was also approved by
regional and local health authorities in Antioquia state and Medellin
city, Valle del Cauca state and Cali city, and Cauca state and Popayan
city.

Study population 

M. tuberculosis isolates were obtained from index tuberculosis 
patients belonging to three cohorts followed in three different cities 
in Colombia (Medellin, Cali, and Popayan) from March 2005 to 
2008. These patients were part of a previous study performed in
the same cities, where we studied factors associated with TB
transmission among household contacts of patients with pulmonary
tuberculosis [28].

Index cases were included consecutively from urban populations 
in cohorts from Medellin and Cali, whereas the smallest cohort 
included cases from Popayan as well as from smaller towns in 
Cauca state. An index case was included if the patient was older
than 15 years and had at least one household contact as described
previously [28]. Index cases were initially diagnosed based on
clinical symptoms, signs, and chest X-rays, and confirmed by
acid-fast bacilli (AFB) sputum examination using the Ziehl-Neelsen stain
at the local health facility. A second sputum specimen was
processed and cultured at the research laboratory designated in
each city. Sputum samples were decontaminated with NaOH and
N-acetyl-L-cysteine [29], cultured on an MGIT system (MGIT
960®) and/or solid Lowenstein-Jensen (LJ) culture media.
Identification of AFB-positive cultures was performed by phenotypic
methods such as niacin, nitrate, and 68°C catalase tests [29].
Drug susceptibility testing for first line anti-TB drugs was
performed using the proportion method in LJ [29]. M. tuberculosis
isolates were frozen in 50% glycerol at -70°C until use. One
isolate obtained from one AFB-positive smear sputum per patient 
was used for genotyping. 

M. tuberculosis genotyping 

Isolates were genotyped using spoligotyping, IS6110-RFLP, and
24-locus MIRU-VNTRs. For IS6110-RFLP genotyping, a standard
protocol was used following international recommendations,
which included a DNA extraction protocol [7,30]. Spoligotyping
was performed following standard procedures [8], using a 
commercial source for membranes and reagents (Isogen Life 
Science, De Meern, the Netherlands). 

MIRU-VNTR genotyping was performed using polymerase 
chain reaction (PCR) amplification of a standard set of 24 MIRU- 
VNTR loci with primers specific for the flanking regions of each 
VNTR region, and the detection of amplified PCR products was 
carried out by electrophoresis. From the gel images, the 
corresponding MIRU-VNTR bands were interpreted as copy 
numbers based on a reference table [9]. Two of the participating
laboratories took part in the first and second multicenter
proficiency studies of the Global Network for the Molecular
Surveillance of Tuberculosis using MIRU-VNTR genotyping
[31].

The roles of the participating laboratories were as follows. The
Mycobacteriology laboratory at University of Cauca was in charge of
culturing and identifying M. tuberculosis from patients in Popayan
and surrounding towns (Cauca state). The Mycobacteriology laboratory
at CIDEIM was in charge of culturing and identifying M.
tuberculosis from patients in Cali (Valle state) and performed
24-locus MIRU-VNTR on those isolates. The Mycobacteriology laboratory
at the National Institute of Health in Bogota was in charge of
performing drug susceptibility tests and genotyping by IS6110-RFLP,
spoligotyping, and 24-locus MIRU-VNTR on isolates from
Cauca and Valle states. The Mycobacteriology laboratory at CIB
performed culture and identification of isolates from patients from
Medellin (Antioquia state) as well as genotyping using IS6110-RFLP,
spoligotyping, and 24-locus MIRU-VNTR.

Clustering analysis, allelic diversity, and discriminatory 
power 

IS6110-RFLP, spoligotyping, and 24-locus MIRU-VNTR
results were analyzed using the BioNumerics software version
6.6 (Applied Maths, Sint-Martens-Latem, Belgium) to establish the
relationships between different isolates of M. tuberculosis. Patterns of
IS6110-RFLP were digitized and similarities were calculated using
the Dice coefficient. MIRU-VNTR and spoligotyping data were
entered as character type and analyzed using the categorical
coefficient. Similarity trees and dendrograms were calculated using
the unweighted pair group method with arithmetic averages
(UPGMA). A cluster was defined as two or more M. tuberculosis
isolates with identical patterns. The MIRU-VNTR allelic diversity
(h) at a given locus was calculated as $h = 1 - \sum x_i^2 \, [n/(n-1)]$, where
$x_i$ is the frequency of the ith allele at the locus and n is the number
of isolates [9,32]. To determine the discriminatory power (DP) of
each genotyping method or a combination thereof, the Hunter-Gaston
Discriminatory Index (HGDI) was calculated [33].
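
The two indices described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the BioNumerics implementation used in the study. It assumes that the bracketed n/(n-1) factor in the allelic diversity formula is a small-sample correction applied to (1 - sum of squared frequencies), and that the HGDI takes its usual form D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), where n_j is the number of isolates of type j. The function names and the toy data are hypothetical.

```python
from collections import Counter


def allelic_diversity(alleles):
    """Diversity at one locus, read as h = (1 - sum(x_i^2)) * n / (n - 1).

    `alleles` holds one copy-number value per isolate (assumed layout).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two isolates are required")
    freqs = [count / n for count in Counter(alleles).values()]
    return (1.0 - sum(x * x for x in freqs)) * n / (n - 1)


def hgdi(types):
    """Hunter-Gaston index: D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)).

    `types` holds one genotype label per isolate, for example a full
    MIRU-VNTR profile stored as a tuple.
    """
    n = len(types)
    if n < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Toy data (hypothetical): four isolates, two of which share a type.
toy = [2, 2, 3, 4]
h_toy = allelic_diversity(toy)
d_toy = hgdi(toy)
```

Under these assumptions the two functions are algebraically identical; the only difference is whether the input is the allele at a single locus or the complete genotype of each isolate.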

Minimum spanning trees (MST) were created using the
BioNumerics software (Version 6.60) to explore the evolutionary
relationships among isolates based on spoligotyping and 24-locus
MIRU-VNTR data. Spoligoforest trees were drawn to determine the
parent-to-descendant spoligotypes in the group of isolates studied using
the Fruchterman-Reingold algorithm and a hierarchical layout
using the SpolTools software [http://www.emi.unsw.edu.au/spolTools,
[34]], and reshaped and colored using the GraphViz
software [http://www.graphviz.org].
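
To make the idea of a minimum spanning tree over genotype profiles concrete, the following Python sketch builds one with Prim's algorithm, using the number of differing loci (a categorical distance) as the edge weight. It is a simplified illustration under stated assumptions: commercial packages may apply additional tie-breaking or priority rules that are not reproduced here, and the function names and the four toy profiles are hypothetical.

```python
def categorical_distance(a, b):
    """Number of loci (or spacers) at which two profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of loci")
    return sum(1 for x, y in zip(a, b) if x != y)


def minimum_spanning_tree(profiles):
    """Prim's algorithm; returns a list of (i, j, distance) edges.

    Indices refer to positions in `profiles`. Ties are broken by
    index order, which is an arbitrary choice made for this sketch.
    """
    n = len(profiles)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        for i in sorted(in_tree):
            for j in range(n):
                if j in in_tree:
                    continue
                d = categorical_distance(profiles[i], profiles[j])
                if best is None or d < best[2]:
                    best = (i, j, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Toy three-locus profiles (hypothetical copy numbers).
toy_profiles = [(2, 3, 4), (2, 3, 5), (2, 4, 5), (7, 8, 9)]
toy_edges = minimum_spanning_tree(toy_profiles)
total_changes = sum(d for _, _, d in toy_edges)
```

In this toy example the first three profiles are linked by single-locus changes, while the fourth joins the tree through an edge of three changes, mirroring how closely related isolates sit on short branches.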

Lineage assignation and comparison with an 
international database 

Spoligotypes in binary format were converted to an octal code 
for comparison with the SITVIT2 proprietary database of Institut 
Pasteur de la Guadeloupe, which is an updated version of the 
previously released SpolDB4 and SITVITWEB databases [10,13]. 
At the time of the analysis, SITVIT2 contained genotyping
information on about 110,000 M. tuberculosis clinical isolates from
160 countries of origin. In this database, a Spoligotype International
Type (SIT) represents a spoligotyping pattern shared by 2 or
more patient isolates, as opposed to "orphan," which does not
match with another pattern in the SITVIT2 database. Major
phylogenetic clades were assigned according to spoligotype
signatures and using revised SpolDB4/SITVITWEB rules
[10,13]. The sublineage distribution in cities from this study was
also compared with those from two other cities of Colombia 
(Buenaventura and Bogota), for which data were available in the 
SITVIT2 database. We also compared the distribution of the 
predominant SITs in the present study with the available data for 
3 neighboring countries (Venezuela, Brazil, and Peru) in the 
SITVIT2 database. 
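
The binary-to-octal conversion mentioned at the start of this section can be sketched as follows. The sketch assumes the commonly used convention for 43-spacer spoligotypes: the first 42 spacers are read as 14 triplets, each converted to one octal digit, and the 43rd spacer is appended as a final 0 or 1, giving a 15-digit code. The function name and the example pattern are illustrative and are not taken from the study's data.

```python
def spoligotype_binary_to_octal(binary):
    """Convert a 43-character presence/absence string to a 15-digit octal code.

    `binary` uses '1' for a spacer that is present and '0' for one that
    is absent (assumed encoding).
    """
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected 43 characters, each '0' or '1'")
    # Spacers 1-42: fourteen triplets, each mapped to one octal digit.
    triplets = [binary[i:i + 3] for i in range(0, 42, 3)]
    octal_digits = [str(int(triplet, 2)) for triplet in triplets]
    # Spacer 43 has no triplet of its own and is appended unchanged.
    return "".join(octal_digits) + binary[42]


# Illustrative pattern: all spacers present except spacers 33 to 36.
example_binary = "1" * 32 + "0" * 4 + "1" * 7
example_octal = spoligotype_binary_to_octal(example_binary)
```

The resulting octal string is the form in which a pattern would be compared against shared types in a database such as SITVIT2.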

Descriptive statistics were used to show the distribution of 
lineages and SITs per cohort of patients. STATA version 12 
(STATA Corp. USA) was used for statistical analysis. Association 
of clades and SITs with demographic and epidemiological
characteristics (human immunodeficiency virus [HIV] serology,
sex), susceptibility to first-line drugs, and number of IS6110-RFLP
copies, as well as differences in distribution according to cohorts,
were calculated using Pearson's Chi-square test when more than
80% of the data had values greater than 5 and Fisher's Exact Test
for the remaining data with smaller values (where at least 20% of
data had values less than 5).
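
The rule for choosing between the two tests can be illustrated with a short Python sketch restricted to 2x2 tables. Several assumptions are made here and are not stated in the text: the "values" in the rule are taken to be expected cell counts, the Chi-square test is computed without continuity correction, and Fisher's test is two-sided. The study itself used STATA; the functions and the example tables below are hypothetical.

```python
from math import comb, erfc, sqrt


def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    n = a + b + c + d
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column needs at least one observation")
    return [[r * k / n for k in cols] for r in rows]


def chi_square_p(table):
    """Pearson Chi-square p-value for a 2x2 table (1 degree of freedom)."""
    exp = expected_counts(table)
    stat = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))
    # Survival function of the Chi-square distribution with 1 df.
    return erfc(sqrt(stat / 2.0))


def fisher_exact_p(table):
    """Two-sided Fisher's exact p-value for a 2x2 table."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def association_p(table):
    """Pick the test from the share of expected counts greater than 5."""
    cells = [e for row in expected_counts(table) for e in row]
    share_large = sum(e > 5 for e in cells) / len(cells)
    if share_large > 0.8:
        return "chi-square", chi_square_p(table)
    return "fisher", fisher_exact_p(table)


# Hypothetical tables: one with large counts, one with a sparse row.
large_result = association_p([[30, 20], [20, 30]])
sparse_result = association_p([[2, 11], [3, 200]])
```

With these toy tables the first call falls under the Chi-square branch and the second under Fisher's exact test, because two of the four expected counts in the sparse table are below 5.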

All study procedures and written consent forms were approved 
by the Ethics Review Boards of the participating institutions. 

Results 

Four hundred and fourteen M. tuberculosis isolates were studied 
from index cases included in three cohorts followed in three 
different Colombian cities for a period of three years (2005 to 
2008). The median age of the patients was 39.1 years (range 15 to 
96 years), and 42.8% of them were female. Bacillus Calmette-Guerin
(BCG) vaccination was confirmed in 75.6% of patients and
1.8% were sero-positive for HIV. Most of the isolates (75.1%) were 
pan-susceptible to anti-TB drugs, 12.1% exhibited some drug 
resistance, and 4.6% were resistant to both isoniazid and rifampin 
(multi-drug resistant, MDR). 

A total of 84 spoligotypes were identified; these included 20 
orphan patterns that have not yet been reported to the SITVIT2 
database (Table 1). The other 64 patterns matched a preexisting 
shared type in the database (50/64 SITs containing 374 isolates) or 
created a new shared type (14/64 SITs containing 20 isolates) 
within this study or after a match with a previously reported 
orphan in the SITVIT2 database (Table 2). Furthermore, 25 out 
of 64 pre-existing SITs containing 355 isolates were clustered (2 to 
124 isolates per cluster), corresponding to 85.75% of all isolates. 
The number of unclustered isolates was 59 (39 isolates with unique 
SITs plus 20 orphan isolates) out of 414, or 14.25%. 
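
The clustering figures above follow directly from the definition of a cluster as two or more isolates with identical patterns. The Python sketch below shows the counting step on hypothetical toy labels and then recomputes the two percentages from the counts reported in the text (355 clustered and 59 unclustered isolates out of 414). The function name and the toy labels are assumptions introduced for illustration.

```python
from collections import Counter


def clustering_summary(patterns):
    """Count clusters (patterns shared by two or more isolates) and unique isolates."""
    counts = Counter(patterns)
    clustered = sum(c for c in counts.values() if c >= 2)
    total = len(patterns)
    return {
        "clusters": sum(1 for c in counts.values() if c >= 2),
        "clustered_isolates": clustered,
        "unique_isolates": total - clustered,
        "clustered_pct": 100.0 * clustered / total,
    }


# Toy labels (hypothetical): two clusters and one unique isolate.
toy_summary = clustering_summary(["A", "A", "B", "C", "C", "C"])

# Recomputing the percentages from the counts reported in the text.
reported_total = 414
clustered_pct = round(100.0 * 355 / reported_total, 2)
unclustered_pct = round(100.0 * 59 / reported_total, 2)
```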

SIT42 (LAM9) with 124 isolates and SIT62 (H1) with 97
isolates represented 29.9% and 23.4% of the total isolates,
respectively (Table 2). Isolates from this study accounted for 3.78%
and 17.7%, respectively, of all isolates with these two SITs in the
SITVIT2 database, and together represented more
than 10% of the isolates in South America, North America, and
Southern Europe. In contrast, SIT207 (H3) with 8 isolates and
SIT727 (H1) with 13 isolates represented 25.8% and 34.2% of the
isolates with those SITs in the SITVIT2 database; these SITs have
been reported mostly in South America and North America (Table 3,
see also Table S3 for comparison of sublineage distribution with
neighboring countries).

MSTs were constructed based on spoligotype patterns and 24- 
locus MIRU-VNTRs. Figure 1A shows MSTs based on 
spoligotypes in which two major groups were evident and 
belonged to the Haarlem and LAM sublineages, and included 
most of the isolates (SIT62 and SIT42, respectively, were the most 
frequent in these lineages). Other isolates were grouped as the ill- 
defined T sublineage (with SIT 53 as the most frequent) and X 
sublineage (with SIT91 as the most frequent). More distance was
evident among isolates belonging to the Haarlem sublineage
than among those belonging to the LAM sublineage. In contrast,
when MSTs were constructed using 24-locus MIRU-VNTR,
isolates belonging to LAM appeared more distant than those
belonging to the Haarlem sublineage. However, 24-locus
MIRU-VNTRs better grouped isolates into major lineages such
as LAM, Haarlem, S, T, and X; unique isolates belonging to the
African sublineage and East African-Indian sublineage were
clearly separated (Figure 1B). MSTs combining spoligotyping
and MIRU-VNTR are shown in Figure 1C. There was agreement 
in the manner in which every genotyping method grouped isolates 
in the major sublineages. The 24-locus MIRU-VNTR analysis of 
common SITs (SIT42, SIT62, and SIT50) clearly showed that 
they are composed of very closely related isolates, which were 
distinguished by only one or two allele changes. Spoligoforest trees
generated by means of the Fruchterman-Reingold algorithm and
hierarchical layout (Figure S2 A and B) confirmed the dominance
of SIT42 (LAM) and SIT62 (Haarlem). The SIT42 (LAM) cluster
was the largest node evolved from SIT53 (T1), from which
multiple spoligotypes arose. The second largest spoligotype, SIT62
(H1), appears to derive originally from SIT53 (T1) and more
recently from SIT50 (H3), finally giving rise to a smaller number of
SITs.

Figure 1. Minimum spanning tree (MST) illustrating evolutionary relationships between M. tuberculosis spoligotypes identified in
our study. (A) MST constructed with spoligotyping. (B) MST constructed with 24-locus MIRU-VNTR. (C) Composite MST with spoligotyping and
MIRU-VNTR markers. MSTs were constructed on all isolates (n = 414, including 20 orphan patterns). The phylogenetic tree connects each genotype based
on the degree of change required to go from one allele to another. The structure of the tree is represented by branches (continuous vs. dashed and
dotted lines) and circles representing each individual pattern. Note that the length of the branches represents the distance between patterns while
the complexity of the lines (continuous, gray dashed, and gray dotted) denotes the number of allele/spacer changes between two patterns: solid
lines, 1 or 2 or 3 changes (thicker ones indicate a single change, while the thinner ones indicate 2 or 3 changes); gray dashed lines represent 4
changes; and gray dotted lines represent 5 or more changes. The size of the circle is proportional to the total number of isolates in our study,
illustrating unique isolates (smaller nodes) versus clustered isolates (bigger nodes). The color of the circles indicates the phylogenetic lineage to
which the specific pattern belongs. Note that orphan patterns are circled in orange. Patterns colored in yellow indicate a strain with an unknown
signature (unclassified).
doi:10.1371/journal.pone.0093848.g001

An evolutionary MST based on spoligotypes as a function of
several associated characteristics showed significant differences
between predominant SITs (>2%) and drug resistance (unknown
not included) (p<0.001) (Figure S1). It is worth noting that all
strains belonging to SIT45 (H1) were MDR (6 out of 6). The
difference between predominant SITs (>2%) and the three most common
IS6110-RFLP copy numbers (8, 9, and 11 bands) was also
significant (p = 0.011). Significant differences were found between
predominant SITs and HIV-positive serology (p = 0.044). The
proportion of HIV-positive patients was greater among isolates
belonging to SIT727 (H1) (2 HIV-positive out of 13) and SIT53
(T1) (2 HIV-positive out of 19) (Figure S1). Unknown HIV status
was not included in the analysis. No significant differences were
noted when comparing the sex ratios of all predominant SITs
(p>0.5).

The phylogeographical distribution of M. tuberculosis lineages 
around Colombia is shown in Figure 2A. Our data showed that 
LAM represented 39.6% of the isolates from Medellin (Antioquia 
State), 39.1% of the isolates from Cali (Valle state) and 24.0% of 
the isolates from several towns in Cauca state. The Haarlem 
sublineage was found to comprise 48.7% of the group of isolates 
from Medellin and 39.0% of the group of isolates from Cali, but 
was not represented in isolates from Cauca. In contrast, the ill- 
defined T sublineage makes up 40.0% of the isolates from Cauca, 
including the town of Popayan (the main city), as compared to the 
proportion of isolates from this sublineage in Medellin (6.4%) and 
Cali (11.0%). No isolates belonging to Beijing sublineage were 
identified in our study. Genotyping data available in the SITVIT2
database from two other cities in Colombia (Buenaventura and
Bogota) showed a predominance of isolates belonging to the LAM
and Haarlem sublineages. Other lineages such as T, X, and S are
less represented in these two cities, with the exception of the Beijing
sublineage, which was frequent in the city of Buenaventura.
Figure 2. Phylogeographical distribution of M. tuberculosis sublineages identified in our study and MST according to patients'
origin grouped by cities or states. (A) The map shows the cities of our study and others with their corresponding pie charts representing the proportion
of M. tuberculosis sublineages; distribution of sublineages among strains belonging to the 3 study sites (Antioquia, Valle del Cauca, and Cauca)
vs. strains contained in the international database SITVIT2 for the cities of Bogota and Buenaventura. (B) MST illustrating evolutionary relationships between M.
tuberculosis spoligotypes in our study as a function of state. The cities have been placed in their corresponding states: Medellin is located in
Antioquia; Cali is located in Valle del Cauca; and Popayan, Caldono, Morales, El Tambo, and Piendamo are located in Cauca state. The map was
obtained from http://www.uxabilidad.com/recursos/mapa-politico-de-colombia-envectores.html, which is available in the public domain.
doi:10.1371/journal.pone.0093848.g002



An MST based on spoligotypes and the state in which the 
isolates were obtained (Figure 2B) revealed a close evolutionary 
relationship with the main spoligotypes found in this study, which 
were LAM (SIT42) and Haarlem (SIT62) sublineages, mostly in
the states of Antioquia and Valle. In contrast, isolates from Cauca
state were more distantly related, even among the most frequently
identified group, the ill-defined T sublineage. The only exception was
SIT53 (T1), which was found in greater proportion in Cauca state
than in other areas of the country. Furthermore, the difference in
the sublineage distribution of isolates from the three states of the
country reported in this study was significant (p<0.001). The
analysis based on 24-locus MIRU-VNTR showed that 52.3% of 
isolates from Medellin were grouped into 40 clusters (2 to 29 
isolates per cluster), 9.4% of isolates from Cali were grouped into 3 
clusters (2 isolates each) and 8% of isolates from Popayan and 
surrounding towns were grouped in one cluster (2 isolates). 

Comparative DP was calculated for the three genotyping
methods used in this study. The method with the highest DP
(0.9916) was 24-locus MIRU-VNTR, followed by IS6110-RFLP
(0.9868) and then spoligotyping (0.8414). The DP obtained using
the combination of the three genotyping methods was slightly
higher than that observed for 24-locus MIRU-VNTRs (0.9918 vs.
0.9916). We also evaluated different combinations of MIRU-VNTR
loci and calculated their corresponding DP. An eight-locus
MIRU-VNTR set comprising loci with an allelic diversity greater
than 0.6 showed a DP of 0.9771, while the 15-locus MIRU-VNTR set
with the highest allelic diversity showed a DP of 0.9855, slightly
above that of the recommended set of 15-locus MIRU-VNTRs [35],
which showed a DP of 0.9847 (see Table S1). The
allelic diversity for the different MIRU-VNTR loci was evaluated
using Hunter-Gaston diversity analysis. Locus QUB11b showed
the greatest allelic diversity, with a diversity index of 0.780 (CI
0.767-0.793), whereas locus 20 showed the lowest diversity index,
0.033 (CI 0.009-0.056) (see Table S2).

Discussion 

This study presents a phylogeographic panorama of the M. 
tuberculosis population structure in Colombia. The isolates analyzed 
from three states showed the LAM and Haarlem clades as being
dominant, grouping 82.8% of them mostly in urban settings 
(Medellin and Cali cities). Other studies carried out in Colombia, 
based on spoligotyping and deposited in the SITVIT2 database, 
showed the same predominance of the Haarlem and LAM 
sublineages. One of these studies was in Bogota, the major urban
setting of Colombia, in which the proportions of the LAM,
Haarlem, and T clades were 49.3%, 25.0%, and 13.8%,
respectively [26]. This study also reported SIT42 (LAM9),
SIT62 (H1), and SIT53 (T1) as the major clusters, comprising
45.8% of isolates. By contrast, our present study showed the same
SITs (42, 62, 53) comprising 58.4% of isolates, of which SIT42
(LAM9) and SIT62 (H1) accounted for 53% of isolates.

The LAM, Haarlem, ill-defined T, X, and S sublineages belong 
to the Euro-American lineage or lineage 4, one of the six major 
lineages for M. tuberculosis that have been described around the 
world [2,36,37]. This lineage, although present in several regions 
of the world, is predominant in Europe and America. In Latin
America, this lineage is dominant and has been reported by
several studies, with considerable variation among countries. In our
study 37.4% of isolates belonged to LAM. This sublineage
appeared to be most common in Brazil (46%), Venezuela (53%),
and Peru (28.3%). In the three countries that share borders with
Colombia, the proportion of Haarlem is quite variable; our data
show a proportion of 41% among the studied isolates, in contrast
to those seen in Venezuela (5%), Brazil (12%), and Peru (28%)
[20-23,25].

When comparing the main SITs found in our study with their 
frequencies in neighboring countries, there is a significant 
difference in the proportion of SIT42 (LAM9): 29.9% for our
study versus 11.8% for Venezuela, 8.8% for Brazil, and 5.6% for
Peru (based on the SITVIT2 database). The differences are more
striking in the case of SIT62 (H1), which is one of the two most
endemic in Colombia, when compared to the same neighboring 
countries, with 23.4% of isolates belonging to this SIT versus 
0.53%, 0.02%, and 0.08% for Venezuela, Brazil, and Peru, 
respectively (SITVIT2 database) (see Table S3). 

Contrary to the sublineage distribution observed in isolates from 
the main urban settings, those obtained from Cauca state were 
grouped predominantly (40%) in the ill-defined T sublineage, with
no isolates belonging to Haarlem. The clear difference among the 
distribution of sublineages in Cauca state compared to that in the 
urban settings of Valle and Antioquia (Cali and Medellin) might be 
explained by the smaller group of isolates studied, and by the 
human origin of these isolates. Most of the cases from Cauca state 
were from patients living in smaller urban and rural areas located 
in the south of the country, which is characterized by a higher 
proportion of indigenous population. These facts suggest differ- 
ences in transmission conditions as well as host factors that 
ultimately may affect the successful establishment of a particular 
M. tuberculosis lineage in a given human population. 

Analysis of M. tuberculosis isolates from Buenaventura city (Valle
state), a seaport on the South Pacific coast of Colombia, has
identified isolates belonging to the LAM and Haarlem sublineages,
but also isolates belonging to the Beijing sublineage (SITVIT2
database). This finding differs from our study, in
which no Beijing isolates were identified. This sublineage,
although described for the first time in 1998 in this Colombian
seaport city, has only been reported since then from patients
originating from this same city, or patients with the same origin but
diagnosed in the country's major inland urban settings [38-40]. In
agreement with these data, despite human migration from
Asia, where Beijing isolates are very frequent, they represent a
proportion of about 5% or less of isolates in Latin America,
according to several reports [20-25].

There was a significant association of MDR-TB and HIV
status with particular spoligotypes. For example, all six isolates
belonging to SIT45 (H1) were MDR. Analysis of these isolates
using 24-locus MIRU-VNTRs revealed that they were very closely
related, but grouped only four of them into two clusters. This
finding may represent a particular transmission focus, because all 
were isolates from patients in Medellin, rather than showing a 
particular predisposing trend of this spoligotype to develop as 
MDR. No clear association has been found in terms of 
predominant lineages or sublineages and MDR among different 
studies in Latin America [25,41,42]. Despite this, the Beijing 
sublineage has been associated with a high proportion of drug 
resistant isolates in several parts of the world [43,44], including 
Colombia [27]. Moreover, the M strain (Haarlem 2) has been 
linked to large MDR-TB outbreaks in Argentina [45]. 

MST analysis provided a detailed picture of genetic distances 
among M. tuberculosis isolates studied based on spoligotyping and 
MIRUs. Using both genotyping methods facilitated a better 
assignment of the major and dominant groups belonging to the 
Euro-American sublineages LAM and Haarlem (lineage 4). This 
was in agreement with previous studies that demonstrated the 
utility of this approach in assigning clades and sublineages, 
particularly within the Euro-American lineage [15]. The hierarchical
layout and Fruchterman-Reingold analysis based on
spoligotyping interestingly showed that SIT42 (LAM9) and
SIT62 (H1), the most conspicuous SITs found in this study, were
derived initially from SIT53 (T1) and later evolved to the more
dominant type. The expansion of these
particular SITs in the studied isolates and populations, over
co-existing non-dominant SITs, might reflect a conjunction of social
changes, such as accelerated population growth in impoverished
sub-urban settings facilitating the transmission of the disease, with
mostly still unknown microbe characteristics that allowed the
adaptation of particular M. tuberculosis isolates to specific human
populations. An example of successful M. tuberculosis isolates in a 
particular population was published recently, linking the success of 
some of them to phenotypic characteristics such as slower growth 
in monocytes and the ability to elicit a less inflammatory response 
[46]. 

A more detailed look at the spoligotyping and 24-locus MIRU-VNTR
composite MST for SIT42 (LAM9) and SIT62 (H1)
showed a MIRU-VNTR multiplicity of clusters: 12 clusters in 
SIT62 and 15 clusters in SIT42, along with unique isolates. Most 
isolates belonging to these two spoligotypes were very closely 
related, since they were differentiated by one or two MIRU- 
VNTR allele changes, supporting the notion that recent expansion 
and evolution of these groups of isolates have occurred in 
accordance with the rate of mutation calculated for MIRU- 
VNTR [17]. The analysis based on 24-locus MIRU-VNTR 
showed that there was a greater percentage of clustering in isolates 
from Medellin (52.3%) than in Cali (9.4%) and Popayan and 
surrounding towns (8.0%). This suggests a more active and 
ongoing transmission in Medellin (the largest group of isolates 
studied) than in the other two areas. 

A practical benefit of the greater discriminatory power of the
24-locus MIRU-VNTR set over spoligotyping and IS6110-RFLP is its
use as an epidemiological marker to distinguish between a diversity
of isolates, including those associated with specific transmission
chains. This is particularly useful in settings with high endemicity
and disease transmission, as observed in one of the urban settings
studied. Supporting the epidemiological use of MIRU-VNTR in
the population studied is the finding that most of the clusters 
identified by this method were circumscribed to one of the two
urban centers studied. In addition, we found that MIRU23, 
ETRB, and Mtub34, which are excluded from the recommended 
15-locus MIRU-VNTR for epidemiological studies [35], had 
allelic diversity index values higher than 0.5. This finding might 
lead us to consider the use of these loci in different genotyping 
studies, especially in areas with M. tuberculosis lineage distribution 
similar to that observed in this study. 

In summary, this study shows the distribution of M. tuberculosis 
lineages and sublineages in several regions in Colombia, with an 
important dominance of LAM and Haarlem belonging to lineage 
4, particularly in two major urban settings (Medellin and Cali). 
Two dominant spoligotypes were LAM9 (SIT42) and Haarlem 1
(SIT62). The use of 24-locus MIRU-VNTR showed the best 
discriminatory power and proved useful in epidemiological studies 
in which the Euro-American lineage is prevalent. The proportion 
of the Haarlem sublineage was higher in Colombia compared to 
that in neighboring countries, suggesting the presence of particular 
conditions of co-evolution with the corresponding human popu- 
lation that favor the success of this sublineage. 

Supporting Information 

Figure S1 A minimum spanning tree (MST) illustrating
evolutionary relationships between the M. tuberculosis
spoligotypes in our study as a function of studied parameters.
(A) Drug resistance; (B) IS6110-RFLP; (C) HIV serology;
(D) Sex ratio. The difference between predominant SITs (>2%)
including SIT45 vs. drug resistance (Code 0, Unknown, not
included) is highly significant (p<0.001); note that all strains
belonging to SIT45/H1 are MDR. The difference between
predominant SITs (>2%) and the 3 major IS6110-RFLP numbers of
bands (8, 9, and 11) is significant (p = 0.011). The
difference between predominant SITs and HIV serology is
significant (p = 0.044); note that the proportion of HIV-positive
patients is higher among strains belonging to SIT727/H1
(number of HIV positive = 2/13) and SIT53/T1 (n = 2/19).
Missing HIV status values have not been taken into account. No
significant difference was observed when comparing sex ratios of
all predominant SITs (p>0.5).
(TIF)

Figure S2 A representation of parent to descendant 
spoligotypes within our study sample (n = 414 isolates) 
as seen through Spoligoforest trees drawn using the 